Features of Nearest Neighbors Distances in High-Dimensional Space
Author
Abstract
Nearest-neighbor methods are essential in a wide range of applications where a probability density must be estimated (e.g., the Bayes classifier, search problems in large databases). This paper examines features of the distribution of nearest-neighbor distances in high-dimensional spaces. It shows that, for a uniform distribution of points in n-dimensional Euclidean space, the distance to the i-th nearest neighbor, raised to the n-th power, follows an Erlang distribution. A power approximation of the newly introduced probability distribution mapping function of nearest-neighbor distances, in the form of a suitable power of the distance, is presented. The influence of the boundary effect is also discussed, along with a way to determine the distribution mapping exponent q for probability density estimation, including the boundary effect, in high-dimensional spaces.
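The Erlang claim in the abstract can be checked numerically with a small simulation. The sketch below is not taken from the paper: it assumes a query point at the center of the unit cube (so the boundary effect is negligible for small i), and it assumes the standard scaling in which the n-th power of the i-th neighbor distance is multiplied by the number of points times the volume of the unit n-ball. Under those assumptions the scaled value should be approximately Erlang with shape i and rate 1, whose mean and variance both equal i. The function names and parameters are hypothetical.

```python
import math
import random


def unit_ball_volume(n):
    """Volume of the unit ball in n-dimensional Euclidean space."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def scaled_ith_distance_samples(n, i, num_points, trials, seed=0):
    """Sample num_points * V_n * r_i**n, where r_i is the distance from the
    cube center to its i-th nearest neighbor among uniform random points.

    Assumption (not from the source): the query sits at the cube center and
    i is small, so the neighbor ball stays well inside the cube.
    """
    rng = random.Random(seed)
    center = [0.5] * n
    scale = num_points * unit_ball_volume(n)
    samples = []
    for _ in range(trials):
        # Distances from the center to every uniformly drawn point, sorted.
        dists = sorted(
            math.dist(center, [rng.random() for _ in range(n)])
            for _ in range(num_points)
        )
        samples.append(scale * dists[i - 1] ** n)
    return samples


if __name__ == "__main__":
    i = 3
    samples = scaled_ith_distance_samples(n=3, i=i, num_points=500, trials=500)
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
    # For an Erlang distribution with shape i and rate 1, both should be near i.
    print(f"shape i = {i}, sample mean = {mean:.3f}, sample variance = {var:.3f}")
```

With a finite number of points the scaled ball volume is only approximately Erlang distributed, so the sample mean and variance should land near i rather than match it exactly. Moving the query point toward a face of the cube is one way to observe the boundary effect the abstract mentions.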
Similar resources
Classification of Chronic Kidney Disease Patients via k-important Neighbors in High Dimensional Metabolomics Dataset
Background: Chronic kidney disease (CKD), characterized by progressive loss of renal function, is becoming a growing problem in the general population. New analytical technologies such as “omics”-based approaches, including metabolomics, provide a useful platform for biomarker discovery and improvement of CKD management. In metabolomics studies, not only prediction accuracy is ...
RNN (Reverse Nearest Neighbour) in Unproven Reserve Based Outlier Discovery
Outlier detection refers to the task of identifying patterns that do not conform to established regular behavior. Outlier detection in high-dimensional data presents various challenges resulting from the “curse of dimensionality”. The current view is that distance concentration, that is, the tendency of distances in high-dimensional data to become indiscernible, making distance-based methods label all points a...
Hyperspectral Image Classification Based on the Fusion of the Features Generated by Sparse Representation Methods, Linear and Non-linear Transformations
The ability to record high-resolution spectral signatures of the earth's surface is the most important feature of hyperspectral sensors. Classification of hyperspectral imagery, in turn, is one of the methods for extracting information from these remote sensing data sources. Despite the high potential of hyperspectral images in terms of information content, there...
Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data
Different aspects of the curse of dimensionality are known to present serious challenges to various machine-learning methods and tasks. This paper explores a new aspect of the dimensionality curse, referred to as hubness, that affects the distribution of k-occurrences: the number of times a point appears among the k nearest neighbors of other points in a data set. Through theoretical and empiri...
Using Triangle Inequality to Efficiently Process Continuous Queries on High-Dimensional Streaming Time Series
In many applications, it is important to quickly find, from a database of patterns, the nearest neighbors of high-dimensional query points that come into the system in a streaming form. Treating each query point as a separate one is inefficient. Consecutive query points are often neighbors in the high-dimensional space, and intermediate results in the processing of one query should help the proc...